Assuming we have a categorical independent variable (IV) and a categorical dependent variable (DV):
| iv | dv |
|---|---|
| HIGH | No |
| HIGH | No |
| LOW | No |
| HIGH | Yes |
| HIGH | Yes |
| HIGH | Yes |
| HIGH | Yes |
| LOW | Yes |
| LOW | Yes |
| LOW | Yes |
Start by calculating the number of observations with each value of each category:
| dv | iv: LOW | iv: HIGH |
|---|---|---|
| No | 1 | 2 |
| Yes | 3 | 4 |
| Total | 4 | 6 |
Then, calculate the proportion/percentage of observations among each value of the IV.
If the independent variable is in the columns, then the columns should sum to 100%.
If the independent variable is in the rows, then the rows should sum to 100%.
| dv | iv: LOW | iv: HIGH |
|---|---|---|
| No | 1 (25%) | 2 (33%) |
| Yes | 3 (75%) | 4 (67%) |
| Total | 4 | 6 |
Look at what happens to the DV at different values of the IV. If your variables are ordinal, you should be able to identify a direction of the effect.
The proportion of “Yes” values decreases as the IV goes from lower to higher, so this is a negative or inverse relationship.
Using a bar graph or line graph can make these relationships easier to spot.
Key rule: always calculate percentages or proportions by categories of the independent variable.
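As a sketch of this rule in code (pandas isn't part of the course materials, just an illustration), here's the toy table from above cross-tabulated with percentages computed within each category of the IV:

```python
import pandas as pd

# The toy data from the table above
df = pd.DataFrame({
    "iv": ["HIGH", "HIGH", "LOW", "HIGH", "HIGH",
           "HIGH", "HIGH", "LOW", "LOW", "LOW"],
    "dv": ["No", "No", "No", "Yes", "Yes",
           "Yes", "Yes", "Yes", "Yes", "Yes"],
})

# Counts: DV values in the rows, IV values in the columns
counts = pd.crosstab(df["dv"], df["iv"])

# Key rule: the IV is in the columns, so normalize by columns
# so that each IV category sums to 100%
col_pct = pd.crosstab(df["dv"], df["iv"], normalize="columns") * 100
```

Here `normalize="columns"` is what enforces the key rule: each column (each IV category) sums to 100%, so the percentages are comparable across IV values.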
If one or both variables are interval-level, you can bin them in order to use them in a cross tab. For instance, you could separate an interval-level variable like age into a series of age ranges.
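A minimal sketch of binning, again assuming pandas; the age cutoffs here are arbitrary illustrations, not part of the original materials:

```python
import pandas as pd

# Interval-level variable: age in years
ages = pd.Series([19, 23, 35, 47, 52, 68, 74])

# Bin into ordered categories so age can be used in a cross tab
age_range = pd.cut(
    ages,
    bins=[17, 29, 44, 64, 120],
    labels=["18-29", "30-44", "45-64", "65+"],
)
```

The result is an ordered categorical variable, so the binned version preserves the direction of the original interval variable.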
Hypothesis: in a comparison of individuals, independents are less likely to turn out to vote compared to people who support one party or another.
How should I calculate proportions here?
Turnout (2020) by Party ID:

| turnout2020 | Democrat | Independent | Republican |
|---|---|---|---|
| 0. Did not vote | 335 | 316 | 382 |
| 1. Voted | 3160 | 560 | 2714 |
Are these results generally consistent with my hypothesis?
Turnout (2020) by Party ID (column % in parentheses):

| turnout2020 | Democrat | Independent | Republican |
|---|---|---|---|
| 0. Did not vote | 335 (10%) | 316 (36%) | 382 (12%) |
| 1. Voted | 3160 (90%) | 560 (64%) | 2714 (88%) |
If we think of party ID as an ordered variable, this is a curvilinear relationship.
What happens if I calculate % among the values of the DV?
Here’s the relationship between education and voter turnout with % calculated on education level:
Turnout (2020) by Education (column % in parentheses):

| turnout2020 | 1. Less than high school credential | 2. High school credential | 3. Some post-high school, no bachelor's degree | 4. Bachelor's degree | 5. Graduate degree |
|---|---|---|---|---|---|
| 0. Did not vote | 130 (41%) | 286 (24%) | 380 (15%) | 135 (7%) | 91 (6%) |
| 1. Voted | 185 (59%) | 883 (76%) | 2148 (85%) | 1749 (93%) | 1388 (94%) |
The results suggest a positive or direct relationship: as education increases, so does the % turnout.
Here's the same relationship between education and voter turnout, but with % calculated across values of the DV (voter turnout):
Turnout (2020) by Education (row % in parentheses):

| turnout2020 | 1. Less than high school credential | 2. High school credential | 3. Some post-high school, no bachelor's degree | 4. Bachelor's degree | 5. Graduate degree |
|---|---|---|---|---|---|
| 0. Did not vote | 130 (13%) | 286 (28%) | 380 (37%) | 135 (13%) | 91 (9%) |
| 1. Voted | 185 (3%) | 883 (14%) | 2148 (34%) | 1749 (28%) | 1388 (22%) |
Here, the results can give the misleading impression that there’s a curvilinear relationship: turnout drops off for Bachelor’s Degrees and above.
Either of these tables might be a valid way to look at these data, but they answer slightly different questions:
If I want to compare turnout at different levels of education, then I need to calculate % turnout among people with different levels of education.
If I want to compare education among voters and non-voters, then I need to calculate % education among people who voted and didn’t vote.
Which variable is the IV or DV is sometimes a theoretical question, but in this case it's unlikely that voting causes people to become more educated, so it probably doesn't make sense to calculate percentages by voting vs. non-voting.
When we have an interval-level outcome and a categorical independent variable, we can group the observations by values of the IV and then calculate the mean of the DV within each group.
For instance, suppose I want to examine the relationship between national wealth and carbon emissions. My hypothesis is that wealthier nations will have higher emissions than poorer nations.
| country | gdp.percap.5cat | co2.percap |
|---|---|---|
| Afghanistan | 1. $3k or less | 0.281803 |
| Albania | 3. $10k to $25k | 1.936486 |
| Algeria | 3. $10k to $25k | 3.988271 |
| Angola | 2. $3k to $10k | 1.194668 |
| Argentina | 3. $10k to $25k | 3.995881 |
| Armenia | 3. $10k to $25k | 2.030401 |
| Australia | 5. $45k or more | 16.308205 |
| Austria | 5. $45k or more | 7.648816 |
| Azerbaijan | 3. $10k to $25k | 3.962984 |
| Bahrain | 5. $45k or more | 20.934996 |
The GDP data have been grouped into five categories, so now I just need to calculate the average CO2 emissions within each group of the ordinal IV:
| GDP Per capita range | CO2 emissions per capita |
|---|---|
| 1. $3k or less | 0.3128312 |
| 2. $3k to $10k | 1.2680574 |
| 3. $10k to $25k | 4.4065669 |
| 4. $25k to $45k | 8.0307610 |
| 5. $45k or more | 12.3134306 |
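A sketch of this grouped-mean calculation, assuming the data sit in a pandas DataFrame. Only the ten rows shown above are used here, so these group means won't match the full-sample table:

```python
import pandas as pd

# The ten rows shown in the example above
df = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria", "Angola", "Argentina",
                "Armenia", "Australia", "Austria", "Azerbaijan", "Bahrain"],
    "gdp.percap.5cat": ["1. $3k or less", "3. $10k to $25k", "3. $10k to $25k",
                        "2. $3k to $10k", "3. $10k to $25k", "3. $10k to $25k",
                        "5. $45k or more", "5. $45k or more", "3. $10k to $25k",
                        "5. $45k or more"],
    "co2.percap": [0.281803, 1.936486, 3.988271, 1.194668, 3.995881,
                   2.030401, 16.308205, 7.648816, 3.962984, 20.934996],
})

# Group by the ordinal IV, then take the mean of the DV within each group
group_means = df.groupby("gdp.percap.5cat")["co2.percap"].mean()
```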
Is this generally consistent with expectations?
Here again, the relationship can be easier to conceptualize if we plot it.
A relationship like this will rarely be perfectly straight, so “linearity” and “curvilinearity” are partly a matter of degree, but there are some cases where there is a clear “U” shape to the relationship:
| iv | dv |
|---|---|
| 1. Extremely liberal | 6.314 |
| 2. Liberal | 5.685 |
| 3. Slightly liberal | 5.001 |
| 4. Moderate; middle of the road | 4.651 |
| 5. Slightly conservative | 4.636 |
| 6. Conservative | 4.974 |
| 7. Extremely conservative | 5.363 |
How can we distinguish correlation from causation?
This process inevitably requires us to consider rival explanations for an observed relationship:
What I want to show is that Fox News viewership causes a decreased chance of getting a Covid vaccine.
There’s a correlation, but I’m concerned this relationship is spurious because I know that things like existing political views are already correlated with media consumption, and those might explain any correlation I see here:
It's possible that this difference in ideology accounts for the entire observed correlation between media habits and vaccines. I can't really rule this possibility out without further investigation.
What if I could randomly assign people to watch Fox News? Random assignment would ensure that nothing is correlated with Fox News viewership.
Ideology may still matter for getting a vaccine, but if conservatism is randomly distributed between viewers and non-viewers, it no longer confounds the observed relationship.
Experiments use random assignment to account for rival explanations. If you randomly assign people to receive a “treatment”, then you can ensure that there is no confounding because nothing is correlated with your IV.
The classic examples are in medicine:
Group A is randomly assigned to receive a placebo (the control group)
Group B is randomly assigned to receive a medicine (the treatment group)
After a certain period of time, we compare the outcomes for both groups.
Differences between the groups can be attributed to the effect of the treatment (+/- some random sampling error)
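This logic can be illustrated with a small simulation (numpy, with made-up effect sizes, not from the original materials): even though a confounder strongly influences the outcome, random assignment keeps it balanced between groups, so the difference in group means recovers the true treatment effect:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# A confounder (e.g. prior health) that strongly affects the outcome
confounder = rng.normal(0, 1, n)

# Random assignment: treatment status is independent of the confounder
treated = rng.random(n) < 0.5

# Outcome = true treatment effect (2.0) + confounder effect + noise
outcome = 2.0 * treated + 3.0 * confounder + rng.normal(0, 1, n)

# The simple difference in group means estimates the treatment effect,
# +/- some random sampling error
estimate = outcome[treated].mean() - outcome[~treated].mean()
```

If `treated` were instead correlated with `confounder` (as in observational data), the same difference in means would mix the treatment effect with the confounder's effect.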
Experiments are considered a “gold standard” because they can account for all kinds of confounding, including confounding caused by unobserved or unexpected relationships.
However, they have two key limitations:
External validity: results in the lab may not easily translate to results in real life.
Feasibility: many interesting questions just can't be randomly assigned. We can't assign "democracy" or "war" or "religion" to people.
Field experiments can lessen the external validity problem by using random assignment in the field.
For instance, one common way to study GOTV messaging is to randomly select households to receive mailers:
Civic duty treatment
Hawthorne treatment
Neighbors treatment
Self treatment
From: Gerber, A. S., Green, D. P., & Larimer, C. W. (2008). Social Pressure and Voter Turnout: Evidence from a Large-Scale Field Experiment. American Political Science Review, 102(1), 33–48. doi:10.1017/S000305540808009X
Field experiments can face fewer external validity problems, but some things still can’t be experimentally manipulated.
Natural experiments use “quasi” randomization or “randomization by nature” where treatments are assigned more-or-less randomly.
Viewing Fox News isn’t random, but areas where Fox News is lower in the channel order will have more viewers.
Channel order is essentially randomly assigned.
So, using channel order as a “treatment” assignment might theoretically allow us to account for confounding in an observational setting.
Other sources of quasi randomization include:
Lotteries (like the Vietnam Draft, or the literal lottery)
Arbitrary cutoffs (barely winning an election vs. barely losing)
Natural disasters and weather events
Still, natural experiments require a mixture of creativity and luck. They’re not available for most questions.
Experimental Research: “treatment” (the independent variable) is randomly assigned by the researcher in order to identify cause and effect relationships.
Quasi-Experiments: the independent variable is “randomly” assigned, but not by the researcher. For instance: a policy that is distributed by a random lottery.
Observational Research: nothing is randomly assigned, at least not by researchers. Phenomena are observed in the real world. This is easier to implement, but it's much harder to distinguish correlations from true causation because lots of things are non-random.
This category includes qualitative methods, but we’re focusing mostly on quantitative (large-N) methods.
In quantitative studies of countries or states, we might not worry much about questions of “representativeness” because we can collect data on all of the countries or states.
But in quantitative research on people, it's typically not feasible to study everyone. We'll need to sample and then make generalizations.
A true population census means studying everyone, or nearly everyone. This is rarely feasible except with a lot of resources!
Survey research aims to find a representative sample of the population and then make inferences about the population.
Literary Digest wanted to predict the outcome of the 1936 presidential election.
Mailed out approximately 10 million "ballots" based on address data from social clubs, automobile registrations, phone books, etc.; about 2.4 million were returned.
10 million is a much larger sample than the target for most polls.
A 24% response rate was low for the time, but higher than contemporary polls achieve.
Scientific polling was in its infancy; Literary Digest touted their lack of any fancy data manipulation as an advantage over polls that used more complex methodologies.
| Prediction for Landon | Prediction for Roosevelt |
|---|---|
| 54% of the popular vote | 41% of the popular vote |
Not quite!
Contemporaneous analyses attributed the error to sampling bias: Literary Digest targeted car owners and telephone owners, so the results skewed toward the wealthy.
Subsequent reanalyses point to non-response bias: Roosevelt supporters were systematically less likely to return the postcard compared to Landon supporters.
Anecdotal evidence suggests that this may have been the result of different levels of enthusiasm: Landon voters really didn’t like Roosevelt and they were motivated to talk about it.
Sample size doesn’t fix bias! Much smaller random polls can easily outperform a large-yet-biased one.
A note that’s still relevant today: non-response bias is a big risk! Some people are more likely to take polls.
Contemporary polls often get stuff wrong, but many general election pollsters are able to get a lot closer with far fewer observations (even as conditions get worse!) How?
Simple Random Samples: get a list of the entire target population, and start selecting at random.
Pros: we’ll converge toward representativeness +/- some random error. As we approach sample sizes of 1,000 or more, the probability of a very large random error becomes negligible.
Cons:
Where am I supposed to get a list of the entire population!?
Can be inefficient: we need a lot of data to get the margin of error to an acceptable level
What if I want to study public opinion among a small group? Say: convicted felons, LGBTQ people, or millionaires?
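To put the "inefficiency" point in numbers: the usual 95% margin of error for a proportion from a simple random sample (worst case p = 0.5) shrinks only with the square root of n, which is why n ≈ 1,000 (a margin of about ±3 points) is such a common target:

```python
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """95% margin of error for a proportion from a simple random sample."""
    return z * math.sqrt(p * (1 - p) / n)

moe_1000 = margin_of_error(1000)  # about +/- 3.1 points
moe_4000 = margin_of_error(4000)  # about +/- 1.5 points
```

Note the square-root relationship: halving the margin of error requires quadrupling the sample size, which is exactly why pure simple random sampling gets expensive.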
Stratified Sampling divides the population into groups and then randomly samples within those groups.
Pros: Ensuring representativeness isn’t left entirely to chance, and I can even oversample certain groups if they’re hard to reach or rare in the population.
Cons: this isn’t pure random sampling, so we will need to re-weight the data to make it look like the actual population and account for this when calculating our margin of error.
Cluster Sampling: Instead of targeting individuals, I might target a geographic area like a household, or a city block, or a census tract.
Pros: Potentially much easier to target a random sample of something like households.
Cons: again, not a random sample, so we may need to do some re-weighting and account for this when calculating our margin of error.
Contemporary scientific polls like the ANES are typically stratified and clustered, mostly because of cost-efficiency: all else equal, it's much cheaper to get a representative sample using stratification and clustering.
Contemporary polls still have to contend with the problem of response bias as well. Some groups are systematically less likely to respond to polls.
But this is why you’ll often need to use weights when working with NES data: weights allow us to make a non-representative sample approximate a representative sample.
| race | N | wtd N |
|---|---|---|
| 1. White, non-Hispanic | 5963 (73%) | 5383 (66%) |
| 2. Black, non-Hispanic | 726 (9%) | 935 (11%) |
| 3. Hispanic | 762 (9%) | 1108 (14%) |
| 4. Asian or Native Hawaiian/other Pacific Islander, non-Hispanic alone | 284 (3%) | 325 (4%) |
| 5. Native American/Alaska Native or other race, non-Hispanic alone | 172 (2%) | 152 (2%) |
| 6. Multiple races, non-Hispanic | 271 (3%) | 296 (4%) |
The basic idea of weighting is simple: if you have twice as many white, college-educated respondents as you would expect for a representative sample, then you just make each white, college-educated response count as half an observation. We call this inverse probability weighting.
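A toy sketch of inverse probability weighting (numpy, with made-up numbers): a group that makes up 2/3 of the sample but only 1/2 of the population gets down-weighted, and the weighted mean recovers the population-level answer:

```python
import numpy as np

# Hypothetical sample: group A is overrepresented
# (2/3 of the sample, but only 1/2 of the population)
group = np.array(["A"] * 400 + ["B"] * 200)
support = np.concatenate([
    np.repeat(1.0, 100), np.repeat(0.0, 300),  # group A: 25% support
    np.repeat(1.0, 150), np.repeat(0.0, 50),   # group B: 75% support
])

# Inverse probability weight = population share / sample share
pop_share = {"A": 0.5, "B": 0.5}
sample_share = {"A": 400 / 600, "B": 200 / 600}
weights = np.array([pop_share[g] / sample_share[g] for g in group])

unweighted = support.mean()                      # pulled toward group A
weighted = np.average(support, weights=weights)  # matches the population mix
```

The weighted mean equals what a perfectly representative 50/50 sample would give (0.5 × 25% + 0.5 × 75% = 50%), while the unweighted mean is biased toward the overrepresented group.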
In practice, this can be complicated because we generally want to weight for lots of characteristics at once, or make inferences about target populations (ex: likely voters) whose actual size is not known beforehand.
Something to keep in mind: weighting can go very wrong! Differences of opinion in how to weight things accounts for a lot of the systematic differences across public opinion surveys that ask the same questions.
Weighting can't solve everything. It's much easier to account for things like small demographic differences than for more complex problems like differential non-response. We know how many 18-25 year olds there are, but we don't always know the number of people who are vulnerable to social desirability bias.
Snowball Sampling is a method of surveying very small or hard-to-reach populations. Like surveys of the homeless, or drug users, or experts on medieval history. It works like a chain letter:
Survey some number of known members
Ask those members to give you the contact information for other members
Repeat N times.
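The chain-letter procedure above can be sketched as a traversal of a contact network (the names and network here are entirely hypothetical):

```python
# Snowball sampling as wave-by-wave traversal of a contact network.
# The contact lists below are entirely made up for illustration.
contacts = {
    "ana": ["ben", "cal"],
    "ben": ["ana", "dia"],
    "cal": ["eve"],
    "dia": [],
    "eve": ["ana"],
}

def snowball(seeds, waves):
    """Start from known members, add each wave's referrals, repeat."""
    sampled = set(seeds)
    frontier = list(seeds)
    for _ in range(waves):
        referrals = [c for person in frontier
                     for c in contacts.get(person, [])
                     if c not in sampled]
        sampled.update(referrals)
        frontier = referrals
    return sampled

sample = snowball(["ana"], waves=2)
```

Note how the final sample depends entirely on who the seeds know: that dependence is exactly why a snowball sample carries no guarantee of representativeness.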
There are no guarantees that a snowball sample won't have the Literary Digest problem; it's only viable as a research strategy because some groups are so hard to reach that they just can't be studied any other way.
Convenience Samples are non-representative samples that simply target a population that is easy to access. (College students are a classic example)
Convenience samples are not representative and don't really aim to be. Instead, they're often used to do things like quickly test out a survey or experiment. It's unlikely that their results can be generalized to the population.
Our class survey will be something like a mixture of snowball and convenience sampling, so we won't have representative data. We'll ignore this problem, but it's definitely something to keep in mind when assessing real-world polls!
Data on countries or U.S. states seemingly aren't really samples at all, so we don't typically have to worry as much about issues of bias, but random error and bias will still come up.
Some sources of bias for cross-national data:
Countries may not keep complete records.
Some sources may disagree on whether some countries exist.
Some national statistics may be based on surveys that have their own biases.
For our purposes, we’ll treat data sets like the “states” and “world” data sets included with your workbook as representative samples from … a potentially infinite population of states or countries.
Differential non-response is still a problem, and it can vary from election to election. But it's less of an issue for issue polling (the stuff in the NES) than for election polling.
A larger sample doesn't necessarily mean more accurate results. Pay attention to things like transparency of methods and the weighting strategy, and be wary of making inferences from a single poll result (polling averages tend to be more reliable).
Estimating a proportion is easier than predicting an outcome. In a closely contested election, a very small miss can completely change people’s expectations about the results.
The margin of error is probably larger than what is reported because the margin only accounts for random sampling error, not bias.